-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: warehouse transformer #5205
base: master
Are you sure you want to change the base?
Conversation
a941966
to
22b287b
Compare
22b287b
to
5ae4838
Compare
1a41ba4
to
e414cca
Compare
404773f
to
c798625
Compare
9f36c96
to
3f956b4
Compare
11996b9
to
14aa043
Compare
6934017
to
dcc7800
Compare
dcc7800
to
a527f87
Compare
a527f87
to
546af2a
Compare
"data": data, | ||
"metadata": map[string]any{ | ||
"table": table, | ||
"columns": columns, | ||
"receivedAt": tec.event.Metadata.ReceivedAt, | ||
}, | ||
"userId": "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This map is being created across multiple functions. Should we consider introducing stricter typing by replacing it with a struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can create a struct with the following information data, metadata and userID. But then later again it will be used to populate the map which is basically TransformerResponse.Output
. I don't see much value in here with defining the struct.
whutils "github.com/rudderlabs/rudder-server/warehouse/utils" | ||
) | ||
|
||
func (t *Transformer) trackEvents(tec *transformEventContext) ([]map[string]any, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should define these methods on the transformEventContext
struct instead of passing tec
as an argument in all the functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
transformEventContext
is a data model. I don't think we should have these methods set on a data model.- Also, if we do that, then we would need to pass all other information like config and logger as well if we want to get some configuration or log things.
if err != nil { | ||
return "", err | ||
} | ||
tec.cache.safeTableNameCache.Store(cacheKey, tableName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the benefit of creating a cache ? Does the "transform" function involve heavy computation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it's not heavy computation, but it still involves a couple of string manipulations, which might be crucial if the string is large. For a particular key, since these don't change, it should be fine to have a cache here. If we get a batch of 1000 events, each with 100 keys and 100K properties, these small improvements will greatly help.
// transformColumnName applies transformation to the input column name based on the destination type and configuration options. | ||
// If `useBlendoCasing` is enabled, it transforms the column name into Blendo casing. | ||
// Otherwise, it applies a more general transformation using the `transformName` function. | ||
func transformColumnName(destType string, intrOpts *intrOptions, destOpts *destOptions, columnName string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
transformColumnName
shares almost all the the code with the transformTableName
function. Should we extract the common code ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's too much of an abstraction to get this common code to a separate function.
require.NotEmpty(t, wResponse.FailedEvents[i].Error) | ||
|
||
require.NotZero(t, pResponse.FailedEvents[i].StatusCode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we also check that Error
and StatusCode
are same in pResponse
and wResponse
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the error string might be different. We can't actually compare the error string, Javascript handling of error is different from GO handling. So we are just comparing if we received failed events or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. What about the status code ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of exceptions in Javascript, we are just returning the internal error, but we are handling those errors appropriately in GO code.
It should be fine as we are mostly concerned with the Output.
1654eca
to
45dcaa9
Compare
45dcaa9
to
964c7c0
Compare
964c7c0
to
b8089e9
Compare
b8089e9
to
8897f5d
Compare
Description
Processor.enableWarehouseTransformations
.Linear Ticket
Security